A Parallel Data Mining Architecture for Massive Data Sets

Authors

  • Felicity George
  • Arno Knobbe
Abstract

This paper discusses a parallel data mining architecture that mines massive data sets efficiently, scanning millions of rows of data per second. The architecture divides the mining process into two distinct components. A parallel server, Compaq’s Data Mining Server (DMS), provides a set of data mining primitives. A data mining client, Syllogic’s DMT/MP, uses these primitives to implement the actual data mining algorithms. The paper discusses the parallel architecture, the primitives used to operate on the data, and the mining algorithms’ use of these primitives. Performance figures are presented for both the primitives and the high-level mining algorithms.
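The split between server-side primitives and client-side algorithms can be illustrated with a minimal sketch. Nothing below is taken from DMS or DMT/MP: the function names `count_primitive` and `most_frequent_value`, the partition layout, and the choice of a value-count primitive are all assumptions made for illustration. Threads are used only to keep the example self-contained; a real parallel server would scan partitions on separate processors or nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_primitive(partitions, column, workers=4):
    """Hypothetical server-side primitive: count the values of one column.

    Each partition is scanned independently, then the partial counts
    are merged into a single result.
    """
    def scan(rows):
        return Counter(row[column] for row in rows)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(scan, partitions))

    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def most_frequent_value(partitions, column):
    """Hypothetical client-side algorithm that only calls the primitive."""
    counts = count_primitive(partitions, column)
    return counts.most_common(1)[0]


# Toy data split into two partitions (assumed layout, not from the paper).
partitions = [
    [{"region": "north"}, {"region": "south"}],
    [{"region": "north"}, {"region": "north"}],
]

print(most_frequent_value(partitions, "region"))
```

In this arrangement the client never touches the rows directly. It sees only aggregated counts, so the volume of data crossing the client/server boundary stays small regardless of table size.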


Similar resources

TeraScope: distributed visual data mining of terascale data sets over photonic networks

TeraScope is a framework and a suite of tools for interactively browsing and visualizing large terascale data sets. Unique to TeraScope is its utilization of the Optiputer paradigm to treat distributed computer clusters as a single giant computer, where the dedicated optical networks that connect the clusters serve as the computer’s system bus. TeraScope explores one aspect of the Optiputer arc...


Mafia: Efficient and Scalable Subspace Clustering for Very Large Data Sets

Clustering techniques are used in database mining for finding interesting patterns in high dimensional data. These are useful in various applications of knowledge discovery in databases. Some challenges in clustering for large data sets in terms of scalability, data distribution, understanding end-results, and sensitivity to input order, have received attention in the recent past. Recent approach...


A parallel method for computing rough set approximations

Massive data mining and knowledge discovery present a tremendous challenge with the data volume growing at an unprecedented rate. Rough set theory has been successfully applied in data mining. The lower and upper approximations are basic concepts in rough set theory. The effective computation of approximations is vital for improving the performance of data mining or other related tasks. The rec...


Parallel Wavelet Transform for Spatio-temporal Outlier Detection in Large Meteorological Data

This paper describes a state-of-the-art parallel data mining solution that employs wavelet analysis for scalable outlier detection in large complex spatio-temporal data. The algorithm has been implemented on multiprocessor architecture and evaluated on real-world meteorological data. Our solution on high-performance architecture can process massive and complex spatial data at reasonable time an...


Fast Parallel Mining of Frequent Itemsets

Association rule mining has become an essential data mining technique in various fields and the massive growth of the available data demands more and more computational power. To address this issue, it is necessary to study parallel implementations of such algorithms. In this paper, we propose a parallel approach to the Frequent Pattern Tree (FP-Tree) algorithm, which is a fast and popular tree...




Publication date: 1999